532 research outputs found

    Learning the structure of Bayesian Networks: A quantitative assessment of the effect of different algorithmic schemes

    Learning the structure of Bayesian Networks (BNs) from data is one of the most challenging tasks in adopting them. The task is complicated by the huge search space of possible solutions and by the fact that the problem is NP-hard. Hence, full enumeration of all possible solutions is not always feasible, and approximations are often required. However, to the best of our knowledge, a quantitative analysis of the performance and characteristics of the different heuristics for this problem has never been done before. For this reason, in this work we provide a detailed comparison of many state-of-the-art methods for structural learning on simulated data, considering BNs with both discrete and continuous variables and different rates of noise in the data. In particular, we investigate the performance of different widespread scores and algorithmic approaches proposed for the inference, and the statistical pitfalls within them.

    Augmenting data warehousing architectures with Hadoop

    As the volume of available data increases exponentially, traditional data warehouses struggle to transform this data into actionable knowledge. This study explores the potential of Hadoop as a data transformation tool within a traditional data warehouse environment. Hadoop's distributed parallel execution model and horizontal scalability offer great capabilities when the amount of data to be processed requires the infrastructure to expand. By classifying the SQL statements responsible for the data transformation processes, we found that Hadoop and its distributed processing model deliver outstanding performance in the analytical layer, namely in the aggregation of large data sets. We demonstrate empirically the performance gains that can be extracted from Hadoop, in comparison to a Relational Database Management System, regarding speed, storage usage, and scalability potential, and suggest how this can be used to evolve data warehouses into the age of Big Data.

    Predicting student performance from Moodle logs in higher education: A course-agnostic approach

    Santos, R. M. C., & Henriques, R. (2023). Predicting student performance from Moodle logs in higher education: A course-agnostic approach. In M. Carmo (Ed.), Education and New Developments 2023 (Vol. 2, pp. 77-81). Science Press. https://end-educationconference.org/wp-content/uploads/2023/06/Education-and-New-Developments_2023_Vol_II.pdf
    The institutional adoption of learning management systems (LMS) aims to improve educational outcomes and reduce churn through student engagement with educational content. Modern LMS record all student interactions and store them as activity logs that encode patterns of learning behaviour. Previous research has shown that insights derived from log data can detect students at risk of failing in a single course or a few courses, but comprehensive institution-wide surveys are few and far between. The work presented herein uses machine learning to create predictive models that identify students at risk or excellent students, using the Moodle logs generated by a sample of 9296 course enrollments at a Portuguese information management school. Thirty-one candidate features were extracted to create and train different predictive models. Model performance was evaluated through 30 repetitions of Stratified K-Fold Cross-Validation, using the area under the receiver operating characteristic (ROC) curve (AUC) and the F1-score. All experiments were repeated with the addition of the average of the intermediate grades obtained by the student in the course as a 32nd candidate feature. The results suggest that features extracted from Moodle logs are good predictors of students at risk, as indicated by the 0.752 AUC score achieved by Random Forest. The addition of intermediate grades significantly improves the predictive performance, leading to an AUC score of 0.922 and an F1-score of 0.693 for the best classifier, Gradient Boosting. However, the performance for identifying excelling students was comparatively lower, with an AUC score of 0.781 and an F1-score of 0.567 for Gradient Boosting. Future work should focus on exploring the implementation of an early warning system that can assist educators in identifying students in need while there is still time to provide feedback and develop corrective measures.
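The two evaluation measures named in the abstract above, AUC and the F1-score, can be illustrated with a minimal pure-Python sketch. The labels and scores below are invented toy values, not data from the study, and the function names `roc_auc` and `f1_score` are hypothetical helpers written for this illustration; the abstract does not say which implementation the authors used.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative (Mann-Whitney formulation); tied scores count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example (assumed values): 1 = student at risk, 0 = not at risk.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
# Turn scores into hard predictions with an assumed 0.5 threshold.
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

toy_auc = roc_auc(toy_labels, toy_scores)
toy_f1 = f1_score(toy_labels, toy_preds)
```

AUC is computed from the raw scores and is threshold-free, whereas the F1-score depends on the chosen decision threshold, which is one reason a model can report a high AUC alongside a more modest F1-score.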

    Artificial Intelligence in geospatial analysis: applications of self-organizing maps in the context of geographic information science.

    A thesis submitted in partial fulfillment of the requirements for the degree of Doctor in Information Management, specialization in Geographic Information Systems.
    The size and dimensionality of available geospatial repositories increase every day, placing additional pressure on existing analysis tools, as they are expected to extract more knowledge from these databases. Most of these tools were created in a data-poor environment and thus rarely address concerns of efficiency, dimensionality and automatic exploration. In addition, traditional statistical techniques rely on several assumptions that are not realistic in the geospatial data domain. An example is the statistical independence between observations required by most classical statistical methods, which conflicts with the well-known spatial dependence that exists in geospatial data. Artificial intelligence and data mining methods constitute a less assumption-dependent alternative for exploring and extracting knowledge from geospatial data. In this thesis, we study the possible adaptation of existing general-purpose data mining tools to geospatial data analysis. The characteristics of geospatial datasets seem to be similar in many ways to those of other aspatial datasets, for which several data mining tools have been used with success in the detection of patterns and relations. It seems, however, that GIS-minded analysis and objectives require more than the results provided by these general tools, and that adaptations are needed to meet the geographical information scientist's requirements. Thus, we propose several geospatial applications based on a well-known data mining method, the self-organizing map (SOM), and analyse the adaptations required in each application to fulfil those objectives and needs. Three main fields of GIScience are covered in this thesis: cartographic representation; spatial clustering and knowledge discovery; and location optimization. (...)

    A novel evaluation framework for recommender systems in big data environments

    Henriques, R., & Pinto, L. (2023). A novel evaluation framework for recommender systems in big data environments. Expert Systems with Applications. https://doi.org/10.1016/j.eswa.2023.120659
    We gratefully acknowledge the support of Aptoide in providing access to the data which made this project possible. This work was supported by national funds through FCT (Fundação para a Ciência e a Tecnologia), under project UIDB/04152/2020, Centro de Investigação em Gestão de Informação (MagIC)/NOVA IMS.
    Recommender systems were first introduced to solve information overload problems in enterprises. Over the last few decades, recommender systems have found applications in several major websites related to e-commerce, music and video streaming, travel and movie sites, social media, and mobile app stores. Several methods have been proposed over the years to build recommender systems. However, very little work has been done on recommender system evaluation metrics. The most common approach to measuring a recommender system's performance in offline settings is to employ micro- or macro-averaged versions of standard machine-learning measures. Profit or other business-oriented metrics have been proposed for other predictive analytics problems, such as churn prediction. However, no such metrics have emerged for the recommender system context. In this work, we propose a novel evaluation metric that incorporates information from the behavior of the online platform's userbase. This metric's rationale is that the recommender system ought to improve customers' repeated use of an online platform beyond the baseline level (i.e. in the absence of a recommender system). An empirical application of this novel metric is also presented in a real-world mobile app store, which integrates the dynamics of large-scale big data environments, a common deployment scenario for these types of recommender systems. The resulting profit metric is shown to correlate with the existing metrics while also being capable of integrating cost information, thereby providing additional business-benefit context that allows us to differentiate between two similarly performing models.

    Design of learning environments: a room affecting what we do and how we feel

    Victorino, G., & Henriques, R. (2021). Design of learning environments: a room affecting what we do and how we feel. In EDULEARN21 Proceedings: 13th International Conference on Education and New Learning Technologies (pp. 10849-10859). https://doi.org/10.21125/edulearn.2021.2256
    The conversion of traditional classrooms into innovative learning environments (ILE) has been increasingly investigated and implemented in many schools, largely due to societal and technological developments (French, Imms, & Mahat, 2020). Higher Education Institutions are no exception. The design of learning environments to support the development of technology-enhanced learning, centred on students and pedagogic theory, has also been studied (Laurillard et al., 2013; Zitter, De Bruijn, Simons, & Cate, 2011). These learning spaces are generally technologically rich, with different screens for visualization and a spatial configuration that aims to promote collaboration (Mei & May, 2018). Nevertheless, attempts to incorporate active learning pedagogies in spaces that are not tuned to the needs of active learning have yielded suboptimal outcomes and considerable dissatisfaction for both teachers and students (Talbert & Mor-Avi, 2019). In this paper, we study the relation between environments built with wellbeing in mind and their use as innovative learning spaces. Following the work of Dolan et al. (2016), we implement the SALIENT checklist in a prototype classroom at NOVA University. The SALIENT checklist recognizes that behaviour is context-dependent and consists of seven dimensions to be considered in the design of environments with wellbeing in mind: 1) Sound, 2) Air, 3) Light, 4) Image, 5) Ergonomics, 6) Nature and 7) Tint. These seven dimensions can have an impact on the learning process, and we hypothesize that a space designed with the SALIENT checklist in mind will allow for better student performance and satisfaction. We conducted qualitative research using a design thinking approach (Brown & Wyatt, 2010) to better understand how to implement the SALIENT checklist in the context of education and which alternatives were best suited to active learning. We ran two design-thinking workshops in which students and professors proposed design ideas for the learning environment. Through these workshops, students and teachers reflected on the implementation of each dimension of SALIENT and discussed its role and possible impact on the integration of new pedagogical strategies.

    Machine Learning applied to student attentiveness detection: Using emotional and non-emotional measures

    Elbawab, M., & Henriques, R. (2023). Machine Learning applied to student attentiveness detection: Using emotional and non-emotional measures. Education and Information Technologies, 1-21. https://doi.org/10.1007/s10639-023-11814-5
    Open access funding provided by FCT|FCCN (b-on). This work was supported by national funds through FCT (Fundação para a Ciência e a Tecnologia), under project UIDB/04152/2020, Centro de Investigação em Gestão de Informação (MagIC)/NOVA IMS.
    Electronic learning (e-learning) is considered the new norm of learning. One of the significant drawbacks of e-learning in comparison to the traditional classroom is that teachers cannot monitor the students' attentiveness. Previous literature used physical facial features or emotional states to detect attentiveness. Other studies proposed combining physical and emotional facial features; however, a mixed model that only used a webcam was not tested. The objective of this study is to develop a machine learning (ML) model that automatically estimates students' attentiveness during e-learning classes using only a webcam. The model would help in evaluating teaching methods for e-learning. This study collected videos from seven students. The webcam of a personal computer is used to obtain a video, from which we build a feature set that characterizes a student's physical and emotional state based on their face. This characterization includes eye aspect ratio (EAR), yawn aspect ratio (YAR), head pose, and emotional states. A total of eleven variables are used in the training and validation of the model. ML algorithms are used to estimate individual students' attention levels. The ML models tested are decision trees, random forests, support vector machines (SVM), and extreme gradient boosting (XGBoost). Human observers' estimation of attention level is used as a reference. Our best attention classifier is XGBoost, which achieved an average accuracy of 80.52%, with an AUROC OVR of 92.12%. The results indicate that a combination of emotional and non-emotional measures can generate a classifier with an accuracy comparable to other attentiveness studies. The study would also help assess e-learning lectures through students' attentiveness. It will therefore assist in developing e-learning lectures by generating an attentiveness report for the tested lecture.
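As an illustration of one feature named in the abstract above, the eye aspect ratio (EAR) is commonly computed from six eye landmarks as the sum of the two vertical eyelid distances divided by twice the horizontal eye width. The sketch below assumes that common six-landmark definition; the abstract does not state which formula or landmark detector the authors used, and the function name and coordinates are hypothetical.

```python
import math


def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """EAR under the assumed six-landmark layout:
    p1 and p4 are the eye corners, p2 and p3 lie on the upper lid,
    p6 and p5 lie on the lower lid directly below p2 and p3."""
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    if horizontal == 0:
        raise ValueError("eye corners coincide; cannot compute EAR")
    return vertical / (2.0 * horizontal)


# Hypothetical pixel coordinates for an open eye and a nearly closed eye.
open_eye = eye_aspect_ratio((0, 0), (1, 1), (3, 1), (4, 0), (3, -1), (1, -1))
closed_eye = eye_aspect_ratio((0, 0), (1, 0.1), (3, 0.1), (4, 0), (3, -0.1), (1, -0.1))
```

Under this definition the ratio falls toward zero as the eyelids close, which is why it is a plausible proxy for blinking or drowsiness in a webcam-only setting.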

    Educational Data Mining to Predict Bachelors Students’ Success

    Predicting academic success is essential in higher education because it is perceived as a critical driver for scientific and technological advancement and for countries' economic and social development. This paper aims to retrieve the most relevant attributes for academic success by applying educational data mining (EDM) techniques to historical data on bachelor's students at a Portuguese business school. We propose two predictive models to classify each student regarding academic success, one at enrolment and one at the end of the first academic year. We implemented a SEMMA methodology and tried several machine learning algorithms, including decision trees, KNN, neural networks, and SVM. At enrolment, the best classifier for academic success was a random forest, with an accuracy of 69%. At the end of the first academic year, the best performance was achieved by an MLP artificial neural network, with an accuracy of 85%. The main findings show that, at enrolment or at the end of the first year, the grades and, thus, the student's previous education and engagement with the school environment are decisive in achieving academic success.
    DOI: 10.28991/ESJ-2023-SIED2-013

    Flight delays and associated factors, Hartsfield-Jackson Atlanta international airport

    CENTERIS 2018 - International Conference on ENTERprise Information Systems / ProjMAN 2018 - International Conference on Project MANagement / HCist 2018 - International Conference on Health and Social Care Information Systems and Technologies, CENTERIS/ProjMAN/HCist 2018
    Nowadays, a downside to traveling is the delays that are constantly being announced to passengers, which decrease customer satisfaction and generate costs. Consequently, there is a need to anticipate and mitigate delays, to help airlines and airports improve their performance or take consumer-oriented measures that can undo or attenuate the effect of these delays on their passengers. The main objective of this study is to predict the occurrence of arrival delays at Hartsfield-Jackson Atlanta International Airport. A Knowledge Discovery in Databases (KDD) methodology was followed, and several data mining techniques were applied. Historical flight and weather data, information about the airplane, and the propagation of delays were gathered to train the model. To overcome the problem of unbalanced datasets, we applied different sampling techniques. To predict delays in individual flights, we used Decision Trees, Random Forest and Multilayer Perceptron. Finally, each model's performance was evaluated and compared. The best model proved to be the Multilayer Perceptron, with 85% accuracy.
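The abstract above mentions sampling techniques for unbalanced datasets without naming them. As a minimal sketch of one such technique, random oversampling duplicates minority-class rows until every class matches the size of the largest one. This is an assumed example and not necessarily the method used in the study; the function name, rows and labels are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Return copies of rows and labels in which every class has as many
    examples as the largest class, by sampling minority rows with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical flights: (departure hour, wind speed); delays are the minority.
flights = [(8, 5), (9, 7), (10, 4), (11, 6), (12, 3), (18, 20), (19, 25)]
status = ["on_time"] * 5 + ["delayed"] * 2
balanced_rows, balanced_status = random_oversample(flights, status)
```

Resampling of this kind is normally applied to the training split only, so that duplicated rows do not leak into the data used to evaluate the model.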